
rustc_const_eval: Expose APIs for signalling foreign accesses to memory #141391


Open
wants to merge 1 commit into
base: master
from


@nia-e (Contributor) commented May 22, 2025

This PR will allow Miri to update its internal state based on information about foreign accesses to its memory performed during FFI. It is necessary as part of rust-lang/miri#4326 to make use of the extra information we gain. The design is currently pending review (see the design document in the linked PR), so this is marked as draft for now.

r? @RalfJung

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 22, 2025
@rust-log-analyzer commented (minimized)

@nia-e nia-e marked this pull request as ready for review May 31, 2025 07:34
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 31, 2025
@rustbot (Collaborator) commented May 31, 2025

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

The Miri subtree was changed

cc @rust-lang/miri

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

Comment on lines +1041 to +1046
```rust
pub fn apply_accesses(
    &mut self,
    mut ids: Vec<AllocId>,
    reads: Vec<std::ops::Range<u64>>,
    writes: Vec<std::ops::Range<u64>>,
) -> InterpResult<'tcx> {
```
Member

I don't really understand what this function is supposed to do, but it seems to be doing the work of finding the allocation that needs to be adjusted? That logic can entirely live inside Miri; Miri is in control of picking the absolute addresses for all memory so it can do this easily. In fact it can do it more efficiently since it has a list of all allocations sorted by their absolute address.

I think the only change you need inside rustc is a version of prepare_for_native_write that takes a range.
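The reviewer's point that Miri can map absolute addresses back to allocations on its own can be sketched as a binary search over allocations sorted by base address. This is a toy model, not Miri's code: `AllocId`, `AllocInfo` and `AddrMap` are hypothetical names, and it assumes each reported access range falls entirely within one allocation.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the interpreter's allocation ID type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AllocId(u64);

/// One allocation as the tracker sees it: absolute base address and size in bytes.
#[derive(Clone, Copy, Debug)]
struct AllocInfo {
    id: AllocId,
    base: u64,
    size: u64,
}

/// Non-overlapping allocations, kept sorted by absolute base address.
struct AddrMap {
    allocs: Vec<AllocInfo>,
}

impl AddrMap {
    fn new(mut allocs: Vec<AllocInfo>) -> Self {
        allocs.sort_by_key(|a| a.base);
        AddrMap { allocs }
    }

    /// Translate an absolute range `start..end` into (allocation, offset range),
    /// provided the whole range lies inside a single allocation.
    fn resolve(&self, range: Range<u64>) -> Option<(AllocId, Range<u64>)> {
        // Binary search: number of allocations whose base is <= range.start.
        let idx = self.allocs.partition_point(|a| a.base <= range.start);
        // The only candidate is the allocation with the largest base <= range.start.
        let a = self.allocs.get(idx.checked_sub(1)?)?;
        let start = range.start - a.base; // cannot underflow: a.base <= range.start
        let end = range.end.checked_sub(a.base)?;
        if start <= end && end <= a.size {
            Some((a.id, start..end))
        } else {
            None
        }
    }
}

fn main() {
    let map = AddrMap::new(vec![
        AllocInfo { id: AllocId(2), base: 0x2000, size: 16 },
        AllocInfo { id: AllocId(1), base: 0x1000, size: 64 },
    ]);
    assert_eq!(map.resolve(0x1008..0x1010), Some((AllocId(1), 8..16)));
    assert_eq!(map.resolve(0x2000..0x2004), Some((AllocId(2), 0..4)));
    assert_eq!(map.resolve(0x1040..0x1044), None); // runs past the end of allocation 1
    assert_eq!(map.resolve(0x0ff0..0x0ff4), None); // below every allocation
}
```

With the allocation and offset range in hand, the only thing left for rustc to provide is the range-restricted preparation step.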

Contributor Author

I think you're right. I recall running into some roadblock when trying to do this within Miri, but I don't see it now looking over it again, so it might be fine to move this as you suggested. Ty!

Contributor Author

Ah, I tried to move it and realised the culprit: get_alloc_raw() is private and lives wholly inside rustc. As far as I can tell, its being private is also why prepare_for_native_call() was inside rustc and not Miri.

Member

Yeah, and get_alloc_raw really should stay private, or else we'll have more bugs like #142575...

Member

But we could still expose something that does get_alloc_raw + prepare_for_native_write.
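A minimal sketch of that suggestion using Rust module privacy: `get_alloc_raw_mut` stays private to a hypothetical `interp` module, and the only public way in is a combined lookup-and-prepare call. All types here are toy stand-ins, not rustc's, and the preparation step is simplified to zeroing uninitialised bytes in the range and marking them initialised (provenance is ignored).

```rust
mod interp {
    use std::collections::HashMap;
    use std::ops::Range;

    /// Toy allocation: raw bytes plus a per-byte "initialised" mask.
    pub struct Allocation {
        bytes: Vec<u8>,
        init: Vec<bool>,
    }

    /// Toy interpreter memory, keyed by a hypothetical numeric allocation id.
    pub struct Memory {
        allocs: HashMap<u64, Allocation>,
    }

    impl Memory {
        pub fn new() -> Self {
            Memory { allocs: HashMap::new() }
        }

        /// Add a fresh, fully uninitialised allocation (0xAA stands in for stale bytes).
        pub fn add(&mut self, id: u64, len: usize) {
            let alloc = Allocation { bytes: vec![0xAA; len], init: vec![false; len] };
            self.allocs.insert(id, alloc);
        }

        /// Private on purpose: nothing outside `interp` gets raw allocation access.
        fn get_alloc_raw_mut(&mut self, id: u64) -> Option<&mut Allocation> {
            self.allocs.get_mut(&id)
        }

        /// The one public entry point: look up the allocation and prepare `range` of it.
        pub fn prepare_for_native_write(&mut self, id: u64, range: Range<usize>) -> Result<(), String> {
            let alloc = self
                .get_alloc_raw_mut(id)
                .ok_or_else(|| format!("unknown allocation {id}"))?;
            if range.start > range.end || range.end > alloc.bytes.len() {
                return Err(format!("range {range:?} out of bounds"));
            }
            for i in range {
                // Simplified preparation: zero uninitialised bytes and mark them initialised.
                if !alloc.init[i] {
                    alloc.bytes[i] = 0;
                    alloc.init[i] = true;
                }
            }
            Ok(())
        }

        /// Read-only query used by the example below.
        pub fn is_init(&self, id: u64, offset: usize) -> Option<bool> {
            self.allocs.get(&id).and_then(|a| a.init.get(offset).copied())
        }
    }
}

fn main() {
    let mut mem = interp::Memory::new();
    mem.add(7, 8);
    mem.prepare_for_native_write(7, 2..4).unwrap();
    assert_eq!(mem.is_init(7, 2), Some(true));
    assert_eq!(mem.is_init(7, 4), Some(false));
    assert!(mem.prepare_for_native_write(9, 0..1).is_err());
    // `mem.get_alloc_raw_mut(7)` would not compile here: the method is private to `interp`.
}
```

The design choice is that callers outside the module can only reach allocations through an operation that also performs the required preparation, so the raw accessor cannot be misused on its own.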

```rust
/// Initialise previously uninitialised bytes in the given range, and set provenance of
/// everything in it to `Wildcard`. Before calling this, make sure all provenance in this
/// range is exposed!
pub fn mark_foreign_write(&mut self, range: AllocRange) {
```
Member

This seems to skip resetting uninit memory to 0. That step must not be skipped!

We shouldn't need two operations here anyway. Just one operation, prepare_for_native_write with a range, that does all the things it used to do, but restricted to a range.
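A toy model of what that single range-restricted operation could look like, combining the steps named in this thread: reset uninitialised bytes to 0, mark them initialised, and give every byte in the range `Wildcard` provenance (per the doc comment on mark_foreign_write quoted above). `Prov`, `Allocation` and the method names are hypothetical; real rustc allocations represent init state and provenance differently.

```rust
use std::ops::Range;

/// Toy per-byte provenance (hypothetical; not rustc's representation).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Prov {
    Absent,
    Tag(u64),
    Wildcard,
}

struct Allocation {
    bytes: Vec<u8>,
    init: Vec<bool>,
    prov: Vec<Prov>,
}

impl Allocation {
    fn new(len: usize) -> Self {
        // 0xAA stands in for whatever stale data sits in uninitialised bytes.
        Allocation { bytes: vec![0xAA; len], init: vec![false; len], prov: vec![Prov::Absent; len] }
    }

    /// One operation, restricted to `range`. The caller must already have exposed
    /// any provenance in the range.
    fn prepare_for_native_write_range(&mut self, range: Range<usize>) {
        for i in range {
            if !self.init[i] {
                // Reset uninitialised bytes to 0 and mark them initialised, in the same step.
                self.bytes[i] = 0;
                self.init[i] = true;
            }
            // As in the quoted doc comment: every byte in the range gets `Wildcard` provenance.
            self.prov[i] = Prov::Wildcard;
        }
    }

    /// The whole-allocation version is just the range version over everything.
    fn prepare_for_native_write(&mut self) {
        let len = self.bytes.len();
        self.prepare_for_native_write_range(0..len);
    }
}

fn main() {
    let mut a = Allocation::new(4);
    // Byte 1 already holds an initialised value with concrete provenance.
    a.bytes[1] = 0x55;
    a.init[1] = true;
    a.prov[1] = Prov::Tag(3);

    a.prepare_for_native_write_range(1..3);
    assert_eq!(a.bytes, vec![0xAA, 0x55, 0x00, 0xAA]); // bytes outside 1..3 untouched
    assert_eq!(a.init, vec![false, true, true, false]);
    assert_eq!(a.prov, vec![Prov::Absent, Prov::Wildcard, Prov::Wildcard, Prov::Absent]);

    a.prepare_for_native_write();
    assert!(a.init.iter().all(|&b| b));
}
```

Expressing the whole-allocation operation as the range version over `0..len` keeps one code path, so the zeroing step cannot be dropped in one variant but not the other.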

```diff
 ///
 /// The allocations in `ids` are assumed to be already exposed.
-pub fn prepare_for_native_call(&mut self, ids: Vec<AllocId>) -> InterpResult<'tcx> {
+pub fn prepare_for_native_call(
```
Member

So if paranoid is false, then this function does what exactly? It seems to do just absolutely nothing.^^

Contributor Author

This zeroes out the bytes of uninitialised memory without actually marking them as initialised. I initially didn't do this, but that resulted in mark_foreign_write overwriting the data we cared about with zeroes. That's also why the latter doesn't zero out anything. We first zero the memory, then call the foreign code, then, without re-zeroing, mark the memory as initialised if it was written to after being zeroed.
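The two-stage ordering described in this comment can be modelled in a few lines. This is a hypothetical toy (`Alloc`, `zero_uninit` and `mark_written` are illustrative names) that follows the scheme as the comment describes it; the reply below questions whether the PR's code actually does this.

```rust
use std::ops::Range;

/// Toy allocation with a per-byte init mask (hypothetical, not rustc's types).
struct Alloc {
    bytes: Vec<u8>,
    init: Vec<bool>,
}

impl Alloc {
    /// Stage 1, before the foreign call: zero uninitialised bytes, but leave them marked uninit.
    fn zero_uninit(&mut self) {
        for (b, &is_init) in self.bytes.iter_mut().zip(&self.init) {
            if !is_init {
                *b = 0;
            }
        }
    }

    /// Stage 2, after the call: mark reported-written bytes as initialised, without re-zeroing.
    fn mark_written(&mut self, range: Range<usize>) {
        for i in range {
            self.init[i] = true;
        }
    }
}

fn main() {
    let mut a = Alloc { bytes: vec![0xAA; 4], init: vec![false; 4] };
    a.zero_uninit();
    a.bytes[2] = 42; // simulated foreign write that bypasses the interpreter
    a.mark_written(2..3);
    assert_eq!(a.bytes, vec![0, 0, 42, 0]);
    assert_eq!(a.init, vec![false, false, true, false]);
}
```

Had stage 2 zeroed the range again, byte 2 would have lost the foreign write, which is the overwrite the comment says it ran into.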

Member

Where does it do that zeroing? prepare_for_native_write only gets called if paranoid is true. So what you say and what the code does do not seem to line up, or I am misunderstanding something.

But also, I think we shouldn't have such a 2-stage approach. This seems easier to reason about if we just fully delay everything until the memory gets accessed the first time.
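A minimal sketch of the fully lazy alternative, assuming (hypothetically) that each foreign access is reported as an (allocation, kind, range) event: nothing is prepared before the call, and an allocation is prepared the first time any access to it is reported. `LazyTracker`, `AccessKind` and `Action` are illustrative names only.

```rust
use std::collections::HashSet;
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AccessKind {
    Read,
    Write,
}

/// What the toy tracker did in response to a reported access.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Action {
    Prepare(u64),
    MarkWritten(u64, Range<usize>),
}

/// Toy tracker: no work before the foreign call; preparation happens on first access.
#[derive(Default)]
struct LazyTracker {
    prepared: HashSet<u64>,
    actions: Vec<Action>,
}

impl LazyTracker {
    fn on_access(&mut self, alloc: u64, kind: AccessKind, range: Range<usize>) {
        // `insert` returns true only the first time this allocation is seen.
        if self.prepared.insert(alloc) {
            self.actions.push(Action::Prepare(alloc));
        }
        if kind == AccessKind::Write {
            self.actions.push(Action::MarkWritten(alloc, range));
        }
    }
}

fn main() {
    let mut t = LazyTracker::default();
    t.on_access(1, AccessKind::Write, 0..4);
    t.on_access(1, AccessKind::Read, 0..4);
    t.on_access(2, AccessKind::Read, 8..12);
    assert_eq!(
        t.actions,
        vec![Action::Prepare(1), Action::MarkWritten(1, 0..4), Action::Prepare(2)]
    );
}
```

Each allocation is prepared at most once, and only if it is actually touched. Note that preparation here runs when the event is processed, i.e. after the access itself, which is the concern the author raises later for reads.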

Contributor Author

I think I explained myself poorly, sorry. Calling get_alloc_raw() has a side effect: it calls get_global_alloc(), which calls adjust_global_allocation(), which calls init_allocation(), and that actually initialises the allocation. The point is that if we skip this, calling get_global_alloc() after the FFI call might "initialise" allocations that the foreign code has already written to without Miri knowing. I confirmed this with testing: if we allocate a pointer and pass it across the FFI boundary but skip calling prepare_for_native_call(false), the written data is replaced with zeroes as soon as get_global_alloc() is called after the FFI call has completed. So, as far as I can tell, init_allocation() is responsible for that zeroing the first time it's called for a given allocation.

Member

There's no zeroing anywhere outside prepare_for_native_write so I don't know what you are talking about.

adjust_global_allocation will set up the memory of the static with whatever the initial value of the static is. Is that what you mean? That's not "zeroing" though unless the initial value of the static happens to be zero.
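The ordering problem in this exchange can be reproduced with a toy model of a lazily set-up static, where the interpreter copies the static's initial value into backing storage on first access. `LazyGlobal` and its methods are hypothetical stand-ins for the get_alloc_raw / adjust_global_allocation path, not Miri's code.

```rust
/// Toy global: backing storage, its initial value, and whether it has been set up.
struct LazyGlobal {
    storage: Vec<u8>,
    initial: Vec<u8>,
    set_up: bool,
}

impl LazyGlobal {
    fn new(initial: Vec<u8>) -> Self {
        let len = initial.len();
        LazyGlobal { storage: vec![0; len], initial, set_up: false }
    }

    /// Interpreter-side access: the first call copies the static's initial value
    /// into storage, overwriting whatever is there.
    fn get(&mut self) -> &[u8] {
        if !self.set_up {
            self.storage.copy_from_slice(&self.initial);
            self.set_up = true;
        }
        &self.storage
    }

    /// Foreign code writes straight into storage, bypassing the interpreter.
    fn foreign_write(&mut self, offset: usize, value: u8) {
        self.storage[offset] = value;
    }
}

fn main() {
    // Set up before the foreign call: the foreign write survives.
    let mut eager = LazyGlobal::new(vec![1, 2, 3]);
    eager.get();
    eager.foreign_write(1, 99);
    assert_eq!(eager.get(), &[1, 99, 3]);

    // Set up only after the foreign call: the initial value clobbers the foreign write.
    let mut late = LazyGlobal::new(vec![1, 2, 3]);
    late.foreign_write(1, 99);
    assert_eq!(late.get(), &[1, 2, 3]);
}
```

In this model the clobbered result only looks like "zeroing" when the static's initial value happens to be all zeroes.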

Member

It's a good point that we need to actually initialize the globals at some point. Why can't we do this lazily on first access, like the other FFI adjustments are done with your approach?

Contributor Author

> adjust_global_allocation will set up the memory of the static with whatever the initial value of the static is

Ah, that would explain why only tests involving statics had trouble here. Every time I ran into that problem the memory was all zeroes, so I guess I incorrectly assumed that's what it was doing. I should probably update this then; apologies for my confusion.

> It's a good point that we need to actually initialize the globals at some point. Why can't we do this lazily on first access, like the other FFI adjustments are done with your approach?

I guess I'm not sure how to do that: if the operation is a read rather than a write, the global needs to have been initialised already, or else the foreign code will read uninitialised data, no? We only learn that an access happened one instruction after it happens, so I think we still have to be cautious here and initialise the globals beforehand. I might be missing something, though.

@RalfJung (Member) commented Jun 9, 2025

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 9, 2025
@rustbot (Collaborator) commented Jun 9, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
4 participants